AITopics | aggregation operator

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Neural Information Processing SystemsApr-28-2026, 13:11:06 GMT

Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers.

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

6e4cdfdd909ea4e34bfc85a12774cba0-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 21:39:55 GMT

artificial intelligence, machine learning, matrix, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Texas (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Sequential-Parallel Duality in Prefix Scannable Models

Yau, Morris, Gupta, Sharut, Engelmayer, Valerie, Irie, Kazuki, Jegelka, Stefanie, Andreas, Jacob

arXiv.org Artificial IntelligenceJun-13-2025

Modern neural sequence models are designed to meet the dual mandate of parallelizable training and fast sequential inference. Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba, that achieve such ``sequential-parallel duality.'' This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference? We begin by describing a broad class of such models -- state space models -- as those whose state updates can be computed using the classic parallel prefix scan algorithm with a custom associative aggregation operator. We then define a more general class, Prefix-Scannable Models (PSMs), by relaxing the state aggregation operator to allow arbitrary (potentially non-associative) functions such as softmax attention. This generalization unifies many existing architectures, including element-wise RNNs (e.g., Mamba) and linear transformers (e.g., GLA, Mamba2, mLSTM), while also introducing new models with softmax-like operators that achieve O(1) amortized compute per token and log(N) memory for sequence length N. We empirically evaluate such models on illustrative small-scale language modeling and canonical synthetic tasks, including state tracking and associative recall. Empirically, we find that PSMs retain the expressivity of transformer-based architectures while matching the inference efficiency of state space models -- in some cases exhibiting better length generalization than either.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.10918

Country:

Europe (1.00)
North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

Agarwal, Rohit, Naidu, Karaka Prasanth, Horsch, Alexander, Agarwal, Krishna, Prasad, Dilip K.

arXiv.org Artificial IntelligenceOct-22-2024

We study the online learning problem characterized by the varying input feature space of streaming data. Although LSTMs have been employed to effectively capture the temporal nature of streaming data, they cannot handle the dimension-varying streams in an online learning setting. Therefore, we propose a dynamic LSTM-based novel method, called packetLSTM, to model the dimension-varying streams. The packetLSTM's dynamic framework consists of an evolving packet of LSTMs, each dedicated to processing one input feature. Each LSTM retains the local information of its corresponding feature, while a shared common memory consolidates global information. This configuration facilitates continuous learning and mitigates the issue of forgetting, even when certain features are absent for extended time periods. The idea of utilizing one LSTM per feature coupled with a dimension-invariant operator for information aggregation enhances the dynamic nature of packetLSTM. This dynamic nature is evidenced by the model's ability to activate, deactivate, and add new LSTMs as required, thus seamlessly accommodating varying input dimensions. The packetLSTM achieves state-of-the-art results on five datasets, and its underlying principle is extended to other RNN types, like GRU and vanilla RNN.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2410.17394

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Norway > Northern Norway > Troms > Tromsø (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On Globular T-Spherical Fuzzy (G-TSF) Sets with Application to G-TSF Multi-Criteria Group Decision-Making

Yang, Miin-Shen, Akhtar, Yasir, Ali, Mehboob

arXiv.org Artificial IntelligenceMar-8-2024

In this paper, we give the concept of Globular T-Spherical Fuzzy (G-TSF) Sets (G-TSFSs) as an innovative extension of T-Spherical Fuzzy Sets (TSFSs) and Circular Spherical Fuzzy Sets (C-SFSs). G-TSFSs represent membership, indeterminacy, and non-membership degrees using a globular/sphere bound that can offer a more accurate portrayal of vague, ambiguous, and imprecise information. By employing a structured representation of data points on a sphere with a specific center and radius, this model enhances decision-making processes by enabling a more comprehensive evaluation of objects within a flexible region. Following the newly defined G-TSFSs, we establish some basic set operations and introduce fundamental algebraic operations for G-TSF Values (G-TSFVs). These operations expand the evaluative capabilities of decision-makers, facilitating more sensitive decision-making processes in a broader region. To quantify a similarity measure (SM) between GTSFVs, the SM is defined based on the radius of G-TSFSs. Additionally, Hamming distance and Euclidean distance are introduced for G-TSFSs. We also present theorems and examples to elucidate computational mechanisms. Furthermore, we give the G-TSF Weighted Average (G-TSFWA) and G-TSF Weighted Geometric (G-TSFWG) operators. Leveraging our proposed SM, a Multi-Criteria Group Decision-Making (MCGDM) scheme for G-TSFSs, named G-TSF MCGDM (G-TSFMCGDM), is developed to address group decision-making problems. The applicability and effectiveness of the proposed G-TSFMCGDM method are demonstrated by applying it to solve the selection problem of the best venue for professional development training sessions in a firm. The analysis results affirm the suitability and utility of the proposed method for resolving MCGDM problems, establishing its effectiveness in practical decision-making scenarios.

application, g-tsfss, g-tsfv, (12 more...)

arXiv.org Artificial Intelligence

2403.0701

Country:

Asia > Pakistan > Gilgit-Baltistan > Gilgit (0.04)
Asia > Taiwan (0.04)

Genre: Research Report > Promising Solution (0.46)

Industry: Education > Educational Setting (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.89)

Add feedback

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Wu, Xinyi, Ajorlou, Amir, Wu, Zihui, Jadbabaie, Ali

arXiv.org Machine LearningOct-5-2023

Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

2305.16102

Country:

North America > United States > Wisconsin (0.05)
North America > United States > Texas (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

DeZoort, Gage, Hanin, Boris

arXiv.org Artificial IntelligenceJun-20-2023

This article derives and validates three principles for initialization and architecture selection in finite width graph neural networks (GNNs) with ReLU activations. First, we theoretically derive what is essentially the unique generalization to ReLU GNNs of the well-known He-initialization. Our initialization scheme guarantees that the average scale of network outputs and gradients remains order one at initialization. Second, we prove in finite width vanilla ReLU GNNs that oversmoothing is unavoidable at large depth when using fixed aggregation operator, regardless of initialization. We then prove that using residual aggregation operators, obtained by interpolating a fixed aggregation operator with the identity, provably alleviates oversmoothing at initialization. Finally, we show that the common practice of using residual connections with a fixup-type initialization provably avoids correlation collapse in final layer features at initialization. Through ablation studies we find that using the correct initialization, residual aggregation operators, and residual connections in the forward pass significantly and reliably speeds up early training dynamics in deep ReLU GNNs on a variety of tasks.

aggregation operator, gnn, initialization, (15 more...)

arXiv.org Artificial Intelligence

2306.11668

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Joint Weighted Average (JWA) Operator

Broomell, Stephen B., Wagner, Christian

arXiv.org Artificial IntelligenceFeb-23-2023

Information aggregation is a vital tool for human and machine decision making, especially in the presence of noise and uncertainty. Traditionally, approaches to aggregation broadly diverge into two categories, those which attribute a worth or weight to information sources and those which attribute said worth to the evidence arising from said sources. The latter is pervasive in particular in the physical sciences, underpinning linear order statistics and enabling non-linear aggregation. The former is popular in the social sciences, providing interpretable insight on the sources. Thus far, limited work has sought to integrate both approaches, applying either approach to a different degree. In this paper, we put forward an approach which integrates--rather than partially applies--both approaches, resulting in a novel joint weighted averaging operator. We show how this operator provides a systematic approach to integrating a priori beliefs about the worth of both source and evidence by leveraging compositional geometry--producing results unachievable by traditional operators. We conclude and highlight the potential of the operator across disciplines, from machine learning to psychology.

artificial intelligence, machine learning, operator, (17 more...)

arXiv.org Artificial Intelligence

2302.11885

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Asia > Singapore > Central Region > Singapore (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Add feedback

Circular Pythagorean fuzzy sets and applications to multi-criteria decision making

Bozyiğit, Mahmut Can, Olgun, Murat, Ünver, Mehmet

arXiv.org Artificial IntelligenceOct-27-2022

In this paper, we introduce the concept of circular Pythagorean fuzzy set (value) (C-PFS(V)) as a new generalization of both circular intuitionistic fuzzy sets (C-IFSs) proposed by Atannassov and Pythagorean fuzzy sets (PFSs) proposed by Yager. A circular Pythagorean fuzzy set is represented by a circle that represents the membership degree and the non-membership degree and whose center consists of non-negative real numbers $\mu$ and $\nu$ with the condition $\mu^2+\nu^2\leq 1$. A C-PFS models the fuzziness of the uncertain information more properly thanks to its structure that allows modelling the information with points of a circle of a certain center and a radius. Therefore, a C-PFS lets decision makers to evaluate objects in a larger and more flexible region and thus more sensitive decisions can be made. After defining the concept of C-PFS we define some fundamental set operations between C-PFSs and propose some algebraic operations between C-PFVs via general $t$-norms and $t$-conorms. By utilizing these algebraic operations, we introduce some weighted aggregation operators to transform input values represented by C-PFVs to a single output value. Then to determine the degree of similarity between C-PFVs we define a cosine similarity measure based on radius. Furthermore, we develop a method to transform a collection of Pythagorean fuzzy values to a PFS. Finally, a method is given to solve multi-criteria decision making problems in circular Pythagorean fuzzy environment and the proposed method is practiced to a problem about selecting the best photovoltaic cell from the literature. We also study the comparison analysis and time complexity of the proposed method.

aggregation operator, artificial intelligence, fuzzy logic, (16 more...)

arXiv.org Artificial Intelligence

2210.15483

Country:

Asia > Middle East > Republic of Türkiye > Ankara Province > Ankara (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (0.50)

Industry: Energy > Renewable > Solar (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)

Add feedback

Admissibility in Strength-based Argumentation: Complexity and Algorithms (Extended Version with Proofs)

Bacquey, Yohann, Mailly, Jean-Guy, Moraitis, Pavlos, Rossit, Julien

arXiv.org Artificial IntelligenceJul-5-2022

Recently, Strength-based Argumentation Frameworks (StrAFs) have been proposed to model situations where some quantitative strength is associated with arguments. In this setting, the notion of accrual corresponds to sets of arguments that collectively attack an argument. Some semantics have already been defined, which are sensitive to the existence of accruals that collectively defeat their target, while their individual elements cannot. However, until now, only the surface of this framework and semantics have been studied. Indeed, the existing literature focuses on the adaptation of the stable semantics to StrAFs. In this paper, we push forward the study and investigate the adaptation of admissibility-based semantics. Especially, we show that the strong admissibility defined in the literature does not satisfy a desirable property, namely Dung's fundamental lemma. We therefore propose an alternative definition that induces semantics that behave as expected. We then study computational issues for these new semantics, in particular we show that complexity of reasoning is similar to the complexity of the corresponding decision problems for standard argumentation frameworks in almost all cases. We then propose a translation in pseudo-Boolean constraints for computing (strong and weak) extensions. We conclude with an experimental evaluation of our approach which shows in particular that it scales up well for solving the problem of providing one extension as well as enumerating them all.

argument, constraint, straf, (16 more...)

arXiv.org Artificial Intelligence

2207.02258

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)

Add feedback

Filters

Collaborating Authors

aggregation operator

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

6e4cdfdd909ea4e34bfc85a12774cba0-Paper-Conference.pdf

Sequential-Parallel Duality in Prefix Scannable Models

packetLSTM: Dynamic LSTM Framework for Streaming Data with Varying Feature Space

On Globular T-Spherical Fuzzy (G-TSF) Sets with Application to G-TSF Multi-Criteria Group Decision-Making

Demystifying Oversmoothing in Attention-Based Graph Neural Networks

Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations

The Joint Weighted Average (JWA) Operator

Circular Pythagorean fuzzy sets and applications to multi-criteria decision making

Admissibility in Strength-based Argumentation: Complexity and Algorithms (Extended Version with Proofs)